Evaluation of Statistical and Machine Learning Systems
(Two Challenging Problems)

Olivier Binette
Duke University / American Institutes for Research

JSM 2022
Washington, DC
August 10, 2022

olivierbinette.ca
​
Overview

Two challenging evaluation problems:
1. the reliability of multiple systems estimation, and
2. the accuracy of entity resolution algorithms.
​
​
What is Evaluation?

Where and what is our science of statistical evaluation?
(It often seems fragmented or neglected in favor of modeling.)

Systematic assessment of a model's performance and properties for the purpose of:
1. choosing the best model,
2. using models appropriately, and
3. understanding real-world effects.

Evaluation studies need to answer specific questions using appropriate methodology.
​
1. Reliability of Multiple Systems Estimation
​
The Problem

How many victims of human trafficking?
• Victims are hidden and hard to reach.
• Organizations like the police and NGOs only reach a small proportion of the victims.

How can we get a representative picture?
​
Multiple Systems Estimation (MSE)

How it works:
• Integrate data (observed victims) from multiple sources through record linkage.
• Perform a missing data analysis to estimate the number of unobserved victims.

[Diagram: police, NGO, and public lists are linked into integrated data, separating the number of observed victims from the number of unobserved victims.]
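To make the two MSE steps concrete (link the lists, then estimate who is missing from all of them), here is a minimal sketch for the simplest case of two lists. It uses Chapman's form of the classical Lincoln-Petersen estimator, which is a textbook choice and not necessarily the model used in this work. The identifiers and the function name `two_list_estimate` are made up for illustration, and the sketch assumes error-free linkage and independent lists.

```python
def two_list_estimate(list_a, list_b):
    """Estimate population size from two linked lists of identifiers.

    Strong assumptions: the two lists are independent, and record linkage
    is error-free, so equal identifiers mark the same person.
    """
    a, b = set(list_a), set(list_b)
    n_both = len(a & b)       # observed by both sources
    n_observed = len(a | b)   # observed by at least one source
    # Chapman's bias-corrected form of the Lincoln-Petersen estimator.
    n_total = (len(a) + 1) * (len(b) + 1) / (n_both + 1) - 1
    return {
        "observed": n_observed,
        "unobserved": n_total - n_observed,
        "total": n_total,
    }


# Hypothetical, already-linked identifiers (not real data).
police = ["v01", "v02", "v03", "v04", "v05", "v06"]
ngo = ["v04", "v05", "v06", "v07", "v08"]

result = two_list_estimate(police, ngo)
print(result)
```

With more than two lists, the same idea is usually expressed through models that allow some dependence between lists, which is where the missing data assumptions become critical.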
​
Does MSE Work?

Contentious question.
• 200 years of controversy!
• No ground truth to check results.

It's all about:
• Missing data assumptions
• Data sufficiency and robustness
• Inductive biases
​
Our Evaluation Proposal

Drop simulation studies that can give any result you like.

Instead:
1. Perform sensitivity analyses.
2. Dig through data for pseudo ground truths.
3. Quantify the consequences of model assumptions.
4. Generate visual & meaningful assessments of robustness.
​
https://github.com/OlivierBinette/MSETools
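As a small illustration of what a sensitivity analysis can look like in this setting, the sketch below varies one untestable assumption in a two-list model: the odds ratio between the lists. It is not taken from MSETools. The function name `unobserved_count` and the counts are hypothetical. An odds ratio of 1 corresponds to independent lists, and values above 1 mean that being on one list makes being on the other more likely.

```python
def unobserved_count(n_both, n_a_only, n_b_only, odds_ratio=1.0):
    """Unobserved count implied by an assumed odds ratio between two lists.

    The odds ratio is (n_both * n_unobserved) / (n_a_only * n_b_only),
    so fixing it determines n_unobserved. It cannot be estimated from
    the two lists alone, which is why it is varied here.
    """
    if n_both <= 0:
        raise ValueError("need at least one individual observed on both lists")
    return odds_ratio * n_a_only * n_b_only / n_both


# Made-up counts, for illustration only.
counts = {"n_both": 30, "n_a_only": 120, "n_b_only": 90}

for theta in (0.5, 1.0, 2.0, 4.0):
    n00 = unobserved_count(**counts, odds_ratio=theta)
    total = sum(counts.values()) + n00
    print(f"odds ratio {theta}: unobserved ~ {n00:.0f}, total ~ {total:.0f}")
```

Reporting the whole range of estimates, rather than the single value at odds ratio 1, shows directly how much of the answer comes from the assumption and how much from the data.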
​
Conclusion Regarding MSE

I don't think we've closed the discussion, but these evaluation tools provide significant practical insights.

I wish I had known more about the science of evaluation when going into this project.
• I feel like this science is or has been neglected. What do you think?
​
2. Evaluation of Entity Resolution Algorithms
​
Inventor Disambiguation at PatentsView.org
​
Inventor Disambiguation

Are they the same?

Goal:
• Cluster inventor mentions that refer to the same real-world person.

Evaluation metrics:
• Precision and recall

Benchmark datasets:
• Hand-disambiguated subsets of the data
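For readers less familiar with precision and recall in a clustering context, here is a minimal sketch of the pairwise versions of these metrics, computed against a fully hand-disambiguated benchmark. Pairwise precision and recall are one common definition; the slides do not say which variant is used. The mention identifiers and function names are made up for illustration.

```python
from itertools import combinations


def linked_pairs(clusters):
    """Set of unordered record pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Pairwise precision and recall of a predicted clustering."""
    pred_pairs, true_pairs = linked_pairs(predicted), linked_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    # Convention (an assumption): score 1.0 when there are no pairs to judge.
    precision = correct / len(pred_pairs) if pred_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Hypothetical inventor mentions: two true inventors, one prediction.
truth = [{"m1", "m2", "m3"}, {"m4", "m5"}]
predicted = [{"m1", "m2"}, {"m3", "m4", "m5"}]

precision, recall = pairwise_precision_recall(predicted, truth)
print(precision, recall)
```

These formulas are only representative of the full data when the benchmark itself is; computed naively on a hand-picked subset, they can be badly biased.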
​
Evaluation should be straightforward... right!?
​
Evaluation is not straightforward.

We proposed new methodology for unbiased performance estimation based on sampling ground truth clusters.
• Representative performance estimates for the first time at PatentsView.org
• More cost-effective and practical (for PatentsView) than sampling record pairs or other approaches.
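To give a sense of how sampled ground truth clusters can yield a representative estimate, here is a minimal sketch of a weighted ratio estimator for pairwise recall. It is a generic survey-sampling construction and not necessarily the estimator proposed in this work. It assumes each ground truth cluster was obtained by drawing a record uniformly at random and resolving its full cluster by hand, so that clusters are sampled with probability proportional to their size. The function name and the example data are hypothetical.

```python
from itertools import combinations


def estimate_pairwise_recall(sampled_clusters, predicted_label):
    """Ratio estimate of pairwise recall from sampled ground truth clusters.

    sampled_clusters: list of non-empty sets of record ids, each assumed to be
        drawn with probability proportional to its size.
    predicted_label: dict mapping record id -> predicted cluster id.
    """
    numerator = denominator = 0.0
    for cluster in sampled_clusters:
        pairs = list(combinations(sorted(cluster), 2))
        # True pairs that the prediction also places together.
        recovered = sum(predicted_label[a] == predicted_label[b] for a, b in pairs)
        # Inverse of the (relative) sampling probability of this cluster.
        weight = 1.0 / len(cluster)
        numerator += weight * recovered
        denominator += weight * len(pairs)
    if denominator == 0:
        raise ValueError("sample contains only singleton clusters")
    return numerator / denominator


# Hypothetical sample of two hand-resolved clusters and a predicted clustering.
sample = [{1, 2, 3}, {4, 5}]
predicted_label = {1: "x", 2: "x", 3: "y", 4: "z", 5: "z"}

estimate = estimate_pairwise_recall(sample, predicted_label)
print(round(estimate, 4))
```

The weights undo the over-representation of large clusters in the sample. Without them, the estimate would describe the benchmark rather than the full data. Like other ratio estimators, this one is only approximately unbiased in small samples.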
​
Usage

https://github.com/PatentsView/PatentsView-Evaluation

Leave a star!
​
PatentsView is releasing new data!

New training data (n > 150,000) to support methodological research

olivier@olivierbinette.ca
​
Conclusion
Concluding Thoughts

• Evaluation is often not straightforward.
• It is often neglected.
• We need to value it more.
• Where is our science of evaluation? Help me find it!

I want to hear your stories and thoughts.

olivier@olivierbinette.ca
​
Papers

Soon on arXiv. Available on my website.

arXiv:2112.01594
​
Thank you!

Funding:
• American Institutes for Research (USPTO)
• NSERC Canada Graduate Scholarship
• NSF CAREER Award (Rebecca Steorts)
• ASA Travel award
• GitHub sponsors (individual contributors)
• G-Research PhD grant